Abstract

In this paper, we try to answer two questions about any given scientific discipline: First, how important
is each subfield and second, how does a specific subfield influence other subfields? We modify the
well-known open-system Leontief Input-Output Analysis in economics into a closed-system analysis focusing
on eigenvalues and eigenvectors and the effects of removing one subfield. We apply this method to the
subfields of physics. This analysis has yielded some promising results for identifying important subfields
(for example the field of statistical physics has large influence while it is not among the largest subfields)
and describing their influences on each other (for example the subfield of mechanical control of atoms
is not among the largest subfields cited by quantum mechanics, but our analysis suggests that these
fields are strongly connected). This method is potentially applicable to more general systems that have
input-output relations among their elements.

National science funding agencies and science policymakers often have to decide on which
science or technology fields a nation will prioritize for a period of time. To answer this
question, the funding agencies need to assess the (future) relative importance of all scientific fields.
Furthermore, once the target, i.e. the prioritized field, is chosen, the question of which other
fields support the target field becomes an important consideration.

These two questions are relevant not only to policymakers and committees in such agencies, 
but also to individual scientists, academic committees and university departments. Of course, 
one can apply peer review, relying on the opinions, feelings and visions of individual experts. 
With the rise of the era of big data, a natural question is whether technical analyses using large 
collections of published patents and research articles can help answer such questions. 

The question of the relative importance of and influences between scientific fields has not
yet been answered completely, although investigating connections between scientific fields
and technological sectors is one of the areas of investigation in the field of scientometrics[1, 2].
In [1], the Japan Science and Technology Agency (JST) was interested in knowing, for a given
sector of patents, which scientific fields have been the primary sources of published information.
The simple approach used in [1] is to calculate how journal articles cited in a specific sector of
patents are distributed across all scientific fields. In [2], the authors were more focused on
how the patterns of citation between patents and scientific publications changed due to national
origin and over time. Such analyses, based on directly counting the number of articles, patents
and citations, are referred to as direct analyses. In this simple, direct statistical approach, an
indirect contribution from scientific fields to sectors of patents is missing: If there is one sector
of patents $T$, which heavily relies on one scientific field $S_i$, which in turn makes use of concepts
and techniques from another scientific field $S_j$, then it is clear that even if there are no direct
citations from $T$ to $S_j$, $S_j$ is a major contributor to $T$. These connections are referred to as
indirect connections. They are the main topics of this investigation.

This idea of considering direct as well as indirect relations, though straightforward, should not
be underestimated. Results of such approaches are sometimes described as network effects[3].
In Fig. 1A, we illustrate an example of a citation relationship between scientific fields in which
indirect connections (between node 1 and node 4 or node 1 and node 3) could in principle play
a more important role than direct ones, due to the lack of a direct connection between nodes
1 and 4 and a weak connection between nodes 1 and 3. While network science researchers,
including those from social network analysis, have often used this perspective[4], the network
perspective is not yet commonplace in scientometrics. This remark does not imply that
scientometricians have not valued the network perspective [3]. Indeed, the network effect is the key
idea behind Google’s PageRank algorithm[5] and its scientific predecessor, the Pinski-Narin
influence methodology[6, 7]. The PageRank algorithm has been used to measure the relative
importance of journals[8] and articles[9, 10].

Now that our work has been placed in its proper context, we first note that we will focus on 
scientific fields instead of journals and articles. Therefore, we may naively adopt the PageRank 
algorithm or equivalently the Pinski-Narin influence methodology for our study, by classifying
publications into scientific fields.

However, our interest goes beyond a measure of relative importance. We also want to know
which fields support or are supported by a given field. Therefore, we consider the Leontief
Input-Output Analysis (LIOA) in economics[11, 12]. LIOA is a method of answering similar
questions about economic sectors. In fact, the similarity between the ideas and motivations
behind LIOA and PageRank has previously been described by Franceschet[7]. In LIOA, one
starts with a direct input-output matrix $B$, where $b^i_j$ represents the number (or monetary value)
of product $i$ required for producing one product $j$. Sector $N$, the last sector, is reserved for
final consumers, so $b^i_N$ refers to the number (or value) of products from sector $i$ used per final
consumer. This sector is also called final demands. Two typical questions in LIOA are as
follows: First, what happens if the final demand increases? How will the total output of the
other sectors change to match an increment in the demand for certain products? Second, which
economic sector is the most important for the whole economy? What are the effects of removing
one sector, e.g. sector $i$, from the economy, on each of the other sectors in the economy? The
former is usually discussed in terms of the Leontief inverse[11], a solution to a specific linear
equation, while the latter is often discussed in terms of the so-called Hypothetical Extraction
Method (HEM)[13]. Roughly speaking, in HEM one compares various quantities calculated
in the complete LIOA and in the LIOA without sector $i$. In this way, if there is a large change in
one of the quantities, e.g. sector $j$’s output, sector $i$ is regarded as important for and especially
influential on sector $j$.

Because these two questions concerning the relative importance of industrial sectors and
their interrelations, such as the effect of changes in the output of product $i$ on product $j$, are
very close to what we are interested in, we use the ideas of LIOA for the present study. To
do so, we need to define an input-output matrix $B$ based on the citation relationships between
scientific fields. Entries in $B$ could be, for example, the ratio between the number of citations
from field $j$ to field $i$ and the total number of citations received by field $j$. In a sense, this ratio
stands for the number of citations of papers in $i$ required for producing a citation in $j$. This
provides a close parallel between LIOA and the problem we intend to study.

However, as we will show further on, this approach is not as straightforward as it may seem.
New concepts and techniques are required to make LIOA applicable to the scientometric
problems that we are interested in. The key difference is that LIOA is performed on an open
system, but the system of scientific fields is a closed system. There is no natural external
sector paralleling the final demand sector in economics unless, perhaps, one includes patents.
This would be a further step requiring more data than what we have at the moment. Thus, we
need an input-output analysis method for closed systems. Furthermore, the number of citations
is not a conserved quantity in the production of scientific works: the total number of citations
received by a field is often not the same as the number of citations initiated from the field.

Fortunately, as we illustrate later, eigenvalues, which are the basis of our definition of
Input-Output Factor (IOF), and eigenvectors, which are the basis of our definition of Input-Output
Influence (IOI), are the key concepts we need for our closed-system input-output analysis.
This relates our method to the PageRank algorithm or, equivalently, the Pinski-Narin
influence methodology. Therefore, the method developed in this study, an extension of LIOA for
a closed system, can also be regarded as an extension of the PageRank algorithm that makes it
applicable to influences among the nodes in a network with an input-output relation.

Aside from the methodological contributions toward answering the two questions we raised
in the beginning, we find that, although overall our IOF is strongly correlated with the number
of citations/publications, there are outliers in the correlation plots between the IOF and the
number of citations/publications. Those outliers have either much stronger (e.g., Statistical
Physics) or much weaker (e.g., Relativity) influences on other fields when compared with
the number of citations/publications in them. It seems to us that these outliers are intuitively
understandable and plausible. Similar meaningful outliers have been identified in relational
studies, in which influences on and from individual fields are considered. For example, we
found that 03 (QuanMech) is closely related to 37 (Mechanical control of atoms) while direct
citations between the two are not significant. This demonstrates that our network-based analysis
can go beyond studies based on direct statistics using the number of citations/publications.

We present the main idea and the formulae in the next section. After that, in §, we use 
a closed-system analysis to investigate relationships between the subfields of physics using 
records from the American Physical Society (APS) of published journal articles and discuss
the validity of the information revealed by our analysis. A more general discussion of the 
validity of our closed-system input-output analysis can be found in §. Discussions of some 
technical issues of our method and some additional results are reported in the Supplementary 
Materials. 


Results 

Modified closed system input-output analysis (MCSIOA): the core idea. We will first
summarize the open-system LIOA in economics and then modify it to make it applicable to closed
systems. In fact, the first input-output model[12] that Leontief proposed was a closed-system
model and only later he and the vast majority of his followers turned to an open-system analysis.
Let us assume the whole economy has $N$ sectors and each sector is a component such as
Agriculture, Mining, Textiles etc. Starting from a matrix $x = (x^i_j)_{N \times N}$ representing the number or
monetary value of all products of sector $i$ that are required for producing the products of sector
$j$, one defines a matrix of direct input-output coefficients

$$ b^i_j = \frac{x^i_j}{X^j}, \qquad (1) $$

where $X^j = \sum_k x^j_k$. With these elements $b^i_j$ we obtain

$$ X^i = \sum_j b^i_j X^j \;\Rightarrow\; X = BX, \qquad (2) $$

meaning that $X$ is an eigenvector of matrix $B$ with eigenvalue 1, the largest eigenvalue of matrix
$B$. For simplicity, we call the eigenvector corresponding to the largest eigenvalue the largest
eigenvector.
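
As a minimal sketch of Eqs. (1) and (2), the following Python snippet builds the coefficient matrix from a small flow matrix and checks that the vector of total outputs is reproduced by $B$. It reads Eq. (1) as $b^i_j = x^i_j / X^j$ with $X^j$ the row sum of sector $j$. The 3-sector flow matrix and the function names are invented for illustration and are not taken from the APS data.

```python
from fractions import Fraction

def total_outputs(x):
    """X^i = sum_j x^i_j: the row sums of the flow matrix."""
    return [sum(row) for row in x]

def direct_coefficients(x):
    """b^i_j = x^i_j / X^j as in Eq. (1); assumes every X^j is non-zero."""
    X = total_outputs(x)
    n = len(x)
    return [[Fraction(x[i][j], X[j]) for j in range(n)] for i in range(n)]

# Hypothetical flows: x[i][j] is the amount of sector i's product used by sector j.
x = [[2, 1, 3],
     [4, 0, 1],
     [1, 2, 2]]

X = total_outputs(x)
B = direct_coefficients(x)

# Eq. (2): applying B to X gives X back, so X is an eigenvector with eigenvalue 1.
BX = [sum(B[i][j] * X[j] for j in range(len(X))) for i in range(len(X))]
```

Exact fractions are used so that the identity $X = BX$ can be compared without rounding. Note that the check does not require total input to equal total output for each sector.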

If we separate the final demand sector, say sector $N$, from the other sectors of an economy,
and denote it as $x^i_N = y^i$, we have

$$ X^i = \sum_{j=1}^{N-1} b^i_j X^j + y^i \;\Rightarrow\; X^{(-N)} = \left(1 - B^{(-N)}\right)^{-1} Y, \qquad (3) $$

where $X^{(-N)}$ is what remains of vector $X$ after its $N$th element is removed and, similarly,
$B^{(-N)}$ is the matrix $B$ after its $N$th row and $N$th column are removed. The inverse matrix is
known as the Leontief inverse, and is denoted as $L = (1 - B^{(-N)})^{-1}$. $L$ is also called the full
input-output coefficient matrix because it takes into account not only the direct coefficients but
also the indirect ones. This can be observed even more clearly if we rewrite $L$ as follows:

$$ \Delta X = L\,\Delta Y = \sum_{n} \left(B^{(-N)}\right)^{n} \Delta Y, \qquad (4) $$

assuming $\Delta Y$ is known.
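
To make the role of indirect coefficients in Eq. (4) concrete, the sketch below sums the power series term by term and compares the result with a direct solution of the linear system. The 2x2 matrix `A` stands in for the reduced matrix $B^{(-N)}$ and is an invented example whose eigenvalues are smaller than 1 in magnitude, which is what makes the series converge. The helper names are ours.

```python
def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def neumann_response(A, dY, terms=200):
    """Approximate (1 - A)^{-1} dY by the partial sum of A^n dY for n = 0 .. terms-1."""
    total = [0.0] * len(dY)
    term = list(dY)            # the n = 0 term is the direct effect dY itself
    for _ in range(terms):
        total = [t + s for t, s in zip(total, term)]
        term = mat_vec(A, term)  # each further term adds one more indirect step
    return total

def solve_2x2(A, dY):
    """Solve (1 - A) dX = dY for a 2x2 matrix A by Cramer's rule."""
    a, b = 1 - A[0][0], -A[0][1]
    c, d = -A[1][0], 1 - A[1][1]
    det = a * d - b * c
    return [(d * dY[0] - b * dY[1]) / det, (a * dY[1] - c * dY[0]) / det]

# Hypothetical reduced coefficient matrix and change in final demand.
A = [[0.2, 0.3],
     [0.4, 0.1]]
dY = [1.0, 0.0]

series = neumann_response(A, dY)
direct = solve_2x2(A, dY)
```

The first term of the series is the direct response, and every later term adds contributions that pass through one more intermediate sector.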

In addition to the question of the system’s response to a change in the final demand, LIOA
can be applied to measuring the relative importance of sectors and the influences among them.
This is called the Hypothetical Extraction Method (HEM) [13]. The basic idea is that for a given
$\Delta Y$ (without the previous $j$th element), one can define

$$ \Delta^{(-j)} X = L^{(-j)} \Delta Y = \left(1 - B^{(-j,-N)}\right)^{-1} \Delta Y, \qquad (5) $$

where $B^{(-j,-N)}$ is what remains of matrix $B$ after both the $j$th and the $N$th ($j \neq N$) row and
column are removed. One then compares $\Delta X$ with $\Delta^{(-j)} X$. If they are quite different (or,
specifically, the $k$th element differs), then the $j$th sector is essential to the economy (to the $k$th
sector). One may say that the importance of sector $j$ to the economy and to each other sector is
concealed in the difference between $L$ and $L^{(-j)}$.

Due to the difference in the time scales of producing next-generation labor and
manufacturing other products, it is plausible to separate the sector of final consumers from the other
industrial sectors. However, in principle the sector of final consumers is an intrinsic
‘manufacturing’ sector of the economy because it provides labor and accepts products. Let us now
turn to the closed-system approach to input-output analysis, in which it is neither necessary nor
possible to treat one sector as external to the system.

Thus, the linear equation technique is clearly no longer applicable to our closed-system
input-output analysis, but we may study the largest non-negative eigenvector of $B$ and $B^{(-j)}$ as
long as those matrices have such an eigenvector. Ideally, we would also like to expect that such a
largest non-negative eigenvector is unique for a given matrix $B$ or $B^{(-j)}$. However, in principle
this is not necessarily true although this is almost always the case in the following empirical
analysis. We introduce a robust analysis by adding a perturbative term to matrices $B$ and $B^{(-j)}$
to make the values all positive just as is used in the PageRank algorithm. Details are provided
in the Supplementary Materials. For simplicity of notation, we still call those perturbed positive
matrices $B$ and $B^{(-j)}$, of which each has a unique all positive largest eigenvector.

We then consider the difference between the eigenvalues and eigenvectors of $B^{(-j)}$ and $B$.
This relies on another interpretation of Eq. (2): the vector $X$ can be regarded as the specific
combination of products that, when supplied to the economy, results in one hundred percent
of the input becoming the output, i.e., the economy operates at full efficiency because the
corresponding eigenvalue is 1 and it is the maximum eigenvalue. Similarly, the maximum
eigenvalue and the corresponding eigenvector of $B^{(-j)}$ are associated with the highest efficiency
and the corresponding combination of products for the economy without sector $j$. Imagine the
case in which sector $j$ has hardly any connections to other sectors, i.e., the values in the $j$th
row and/or column are very small compared with other elements of $B$. Denoting the largest
eigenvalue of matrix $B^{(-j)}$ by $\lambda^{(-j)}$, then $\lambda^{(-j)}$ will be very close to 1. Otherwise, when
elements in the $j$th row and column are relatively large, $\lambda^{(-j)}$ will be much smaller than 1. The
fact that all eigenvalues of the matrix $B$ (and also all $B^{(-j)}$) must be less than or equal to 1
in magnitude will be shown in the Supplementary Materials.

Therefore, we propose using the IOF defined by

$$ S^j_{IO} = 1 - \lambda^{(-j)} \qquad (6) $$

to measure the relative importance of sector $j$. This answers the first question we raised in this
paper.
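
The following is a minimal sketch of how the IOF of Eq. (6) could be computed for a small, entrywise positive matrix. It uses plain power iteration as a stand-in eigenvalue solver, which is our choice for the illustration and not necessarily what was used for the reported results. The flow matrix and all function names are hypothetical.

```python
def largest_eigenvalue(M, iters=2000):
    """Power iteration for an entrywise positive matrix M (assumed, so that it converges)."""
    n = len(M)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][c] * v[c] for c in range(n)) for i in range(n)]
        lam = sum(w)                 # v sums to 1, so sum(M v) estimates the eigenvalue
        v = [wi / lam for wi in w]   # renormalise to unit sum
    return lam, v

def remove_sector(M, j):
    """Return M with its j-th row and j-th column deleted."""
    n = len(M)
    return [[M[r][c] for c in range(n) if c != j] for r in range(n) if r != j]

def iof(B, j):
    """Input-Output Factor of sector j: 1 minus the largest eigenvalue of B without j."""
    lam_j, _ = largest_eigenvalue(remove_sector(B, j))
    return 1.0 - lam_j

# Hypothetical positive flow matrix; B is built column-normalised by row sums as in Eq. (1).
x = [[2, 1, 3],
     [4, 1, 1],
     [1, 2, 2]]
totals = [sum(row) for row in x]
B = [[x[i][j] / totals[j] for j in range(3)] for i in range(3)]

lam_full, _ = largest_eigenvalue(B)
scores = [iof(B, j) for j in range(3)]
```

For the full matrix the largest eigenvalue is 1, so every score lies between 0 and 1, with larger values indicating sectors whose removal lowers the largest eigenvalue more.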

Let us now attempt to provide an answer to the second question. Intuitively, the influence
of sector $j$ on each of the other sectors is concealed in the difference between $|X\rangle$ and $|X^{(-j)}\rangle$,
which are, respectively, the largest eigenvector of $B$ and $B^{(-j)}$. Thus, we propose the following
quantity, which we call IOInfluence (IOI), to provide a comparison between $|X\rangle$ and $|X^{(-j)}\rangle$:

$$ \Delta^j_k = \frac{\langle k|X\rangle - \lambda^{(-j)} \langle k|X^{(-j)}\rangle}{\langle k|X\rangle}, \qquad (7) $$

where $|X^{(-j)}\rangle$ is the largest eigenvector of matrix $B^{(-j)}$ and $|k\rangle$ is the column vector with all
zeros except for the $k$th element. In a sense this eigenvector represents the best combination of
products when sector $j$ is removed from the economy. The amount of total outputs of the new
system without sector $j$ intuitively should be $\lambda^{(-j)}$ times the original total output, thus the term
$\lambda^{(-j)} \langle k|X^{(-j)}\rangle$. Note that this definition of $\Delta^j_k$ is based on intuition and has not been fully
justified.
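
A small sketch of the IOI computation may help fix ideas. It follows Eq. (7) as printed (the original component minus the rescaled component after removal, divided by the original component). Two points are assumptions of this sketch and are not stated in the text: both eigenvectors are normalised to unit sum, and power iteration is used as the solver. The flow matrix and function names are invented.

```python
def perron(M, iters=5000):
    """Largest eigenvalue and unit-sum eigenvector of an entrywise positive matrix."""
    n = len(M)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][c] * v[c] for c in range(n)) for i in range(n)]
        lam = sum(w)                 # v sums to 1, so sum(M v) estimates the eigenvalue
        v = [wi / lam for wi in w]
    return lam, v

def remove_sector(M, j):
    """Return M with its j-th row and j-th column deleted."""
    n = len(M)
    return [[M[r][c] for c in range(n) if c != j] for r in range(n) if r != j]

def ioi(B, j):
    """Return {k: Delta^j_k} for every k != j, following Eq. (7) as printed.

    Assumption: X and X^(-j) are both normalised to sum to 1.
    """
    _, X = perron(B)
    lam_j, X_j = perron(remove_sector(B, j))
    kept = [k for k in range(len(B)) if k != j]   # original indices that survive removal
    return {k: (X[k] - lam_j * X_j[pos]) / X[k] for pos, k in enumerate(kept)}

# Hypothetical positive flow matrix, normalised as in Eq. (1).
x = [[2, 1, 3],
     [4, 1, 1],
     [1, 2, 2]]
totals = [sum(row) for row in x]
B = [[x[i][j] / totals[j] for j in range(3)] for i in range(3)]

delta = ioi(B, 0)   # influence of removing sector 0 on sectors 1 and 2
```

Because the values depend on how the two eigenvectors are normalised, results from this sketch should only be compared with each other, not with the figures reported in the paper.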

Eq. (6) and Eq. (7) are the two core formulae in this paper. All of the calculations in 
the following sections are based on these two formulae. Within the general framework of 
the closed-system input-output analysis sketched above, we will now answer the two central 
questions raised at the beginning of this manuscript.

MCSIOA applied to relationships between subfields in physics: the results

The above closed-system input-output analysis is now applied to the relative importance of and
influences among scientific fields. We consider subfields of physics as a case study.

Construction of the closed Input-Output system. We use data regarding all papers published 
in APS (American Physical Society) journals between 1976 and 2013. A total of 390208 papers 
have Physics and Astronomy Classification Scheme (PACS) codes. PACS is a classification 
system of subfields in physics consisting of 6-digit 4 to 5-level codes. We will, however, use 
only the first 3 levels. There are 10 (resp. 78 and 937) PACS codes at level 1 (resp. level 2
and level 3). APS papers come with several author-defined PACS codes. The rich information 
encoded in such a classification system has been discussed in e.g. [14]. 

To establish the input-output system of subfields, we regard each PACS code as a sector. A
citation received by a paper in one sector (PACS code $i$) from a paper in another sector (PACS
code $j$) is modeled as an input from sector $i$ to sector $j$. We then count the papers and citations
within the APS data. For example, if one paper $p$ published in sector $j$ cites a paper $q$ published
in sector $i$, there is a link from $i$ to $j$. Each paper may have multiple PACS codes. For instance,
if in a time window $t$, a paper $p$ has $P_p$ PACS codes, one of which is $j$, and cites $C_p$ papers,
one of which is $q$, which has $P_q$ PACS codes, one of which is $i$, then the contribution towards
the input-output relation from $i$ to $j$ due to the citation from paper $p$ to paper $q$ is

$$ x^i_j(p \to q) = \frac{1}{P_p C_p P_q}. \qquad (8) $$

The time window we use in this study is five years. We provide an example of the weighted
network in Fig. 1, where a citation, as in Fig. 1B, from Paper A to Paper B is converted into a
network, as in Fig. 1C, and a matrix representing the weighted network, as in Fig. 1D, following
Eq. (8). Input-output networks/matrices of PACS codes can be established at various
levels in this way. In LIOA in economics, $X^i = X_i$: the total input to an economic sector
equals the total output from that sector. Here it is not necessarily true that the citation count
from the field is the same as the citation count to the field. Luckily for us, we do not need this
to be the case for the analysis to work.
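
The weighting rule of Eq. (8) can be sketched in a few lines of Python. The snippet below reproduces the entries 1/126 of the Fig. 1 example, in which Paper A (2 PACS codes, 21 references) cites Paper B (3 PACS codes). The function name and the dictionary layout are our own choices, and the five-year time window is left out for brevity.

```python
from collections import defaultdict
from fractions import Fraction

def add_citation(x, citing_codes, n_refs, cited_codes):
    """Add one citation p -> q to the flow table x[(i, j)] using the weight of Eq. (8).

    citing_codes: PACS codes of the citing paper p (P_p of them)
    n_refs:       number of papers cited by p (C_p)
    cited_codes:  PACS codes of the cited paper q (P_q of them)
    """
    w = Fraction(1, len(citing_codes) * n_refs * len(cited_codes))
    for i in cited_codes:        # i: sector of the cited paper, the source of the input
        for j in citing_codes:   # j: sector of the citing paper, the receiver
            x[(i, j)] += w

x = defaultdict(Fraction)
add_citation(
    x,
    citing_codes=["03.67.Lx", "42.50.-p"],            # Paper A
    n_refs=21,
    cited_codes=["03.67.Lx", "32.80.Pj", "32.80.Rm"],  # Paper B
)
```

With this weighting a single citation contributes a total of $1/C_p$ across all code pairs, so each citing paper distributes a total weight of 1 over its whole reference list.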

The relative importance of subfields and its evolution. With the set of input-output
networks/matrices (matrices $x$ and matrices $B$) of PACS codes for different time periods, we first
discuss the relative importance of subfields and how this evolves.

First, we examine the correlation between the relative importance, as measured by the IOF,
and by the number of times each subfield is cited. In Fig. 2A, we compare the IOF rankings
of PACS codes with the rankings obtained from the total number of citations received by all
papers with corresponding PACS codes. As shown in the figure, although the two rankings are
correlated, there are some outliers: some fields, such as 05 and 02, have relatively higher IOF
rankings (smaller y values, toward the top in the figure) whereas others, such as 04 and 98, have
higher citation rankings (smaller x values, toward the right in the figure).

[Figure 1. Text recoverable from the citation panel:]

Paper A (DOI: PhysRevLett.101.170504)
PACS: 03.67.Lx; 42.50.-p
References (total 21):
1. Paper B (DOI: PhysRevLett.85.2208)
   PACS: 03.67.Lx; 32.80.Pj; 32.80.Rm
2. ...

[Matrix recoverable from the matrix panel; rows are the "from" codes, columns the "to" codes:]

from \ to    42.50.-p    32.80.Pj    03.67.Lx    32.80.Rm
42.50.-p     0           0           0           0
32.80.Pj     1/126       0           1/126       0
03.67.Lx     1/126       0           1/126       0
32.80.Rm     1/126       0           1/126       0

FIG. 1: Citations among all APS papers are converted into an input-output network/matrix of PACS 
codes. (A) A fictitious network in which there are relatively large differences between the direct and 
indirect influences between nodes. (B) A piece of the real APS citation network: paper A with PACS 
codes 03.67.Lx and 42.50.-p cites paper B with PACS codes 03.67.Lx, 32.80.Pj and 32.80.Rm. (C) In 
the corresponding input-output network of PACS codes, directed links from the PACS codes of Paper B 
to the PACS codes of paper A are added to the network of PACS codes following the citations from paper 
A to paper B. (D) A matrix version of B with numbers calculated using Eq. (8). 


PACS 05 is the field of “Statistical physics, thermodynamics and nonlinear dynamical
systems” (StatPhys for short). From the correlations for 2009 - 2013 shown in Fig. 2, we see
that 05 has a large influence on other fields of physics relative to the number of citations it
received, and this has been the case for this field for the past few decades (See Fig. 3 in the
main text and Fig. S2 in the Supplementary Materials). This means that not only were papers
in StatPhys (05) cited directly by many papers in other fields, but also that 05 plays an important
indirect role: many other influential papers cited those papers that directly cited papers in 05,
and so on. This picture of the importance of StatPhys is consistent with our own intuition that,
in recent years, concepts, models and methods from statistical physics have been extensively
used in other scientific fields.

Similar but slightly different behavior can be observed for PACS 02, “Mathematical methods
in physics”. It has a relatively low IOF ranking and total number of citations. However,
considering its low number of citations, its IOF score is outstanding. This means that the total number
of citations received directly by this field is not very high, but its indirect effect makes this field
more important than the number of received citations suggests.

FIG. 2: (A) Correlation between the citation rankings and IOF rankings of subfields during the years
between 2009 and 2013 is plotted in this figure. The y-axis (x-axis) represents rankings based on the IOF
(total citations received) of each level-2 PACS code. In the region above the diagonal line along which
the two rankings are equal, PACS codes have higher (smaller y values, toward the top) IOF rankings than
citation rankings, as is the case for 05 (StatPhys) and 02 (MathPhys). In the lower region, the citation
rankings of the PACS codes are higher (smaller x values, toward the right) than their IOF rankings, as is
the case for 04 (Relativity) and 98 (Stellar). (B) To provide a comparison, we plot the number of citations
received versus the number of publications. This considers only the direct connections between fields.
The two numbers are highly correlated and those fields that stand out in (A) are no longer exceptional in
this figure. See Supplementary Materials for these figures of subfields at other levels.

PACS 04 and 98 are among the fields that have higher citation rankings than their IOF
rankings. This result does not imply that those fields are less important: it just means that they have
smaller influence on other fields. It is understandable that each of these fields is more like a
closed field of its own. Many physicists may not need to know much about stellar systems
(98) to conduct their research.

We performed a similar comparison between the citation rankings and publication rankings
of the subfields. We observed from Fig. 2B that these rankings are better correlated than the
previous pair of rankings, so that, generally speaking, the outliers in Fig. 2B stand out less.
Consider, for example, the subfields 04 and 05 in the two figures: they are quite different
in Fig. 2A while they are both on the diagonal line in Fig. 2B. We want to emphasize that
by including indirect connections, IOF rankings provide more insightful and valuable
information than citation rankings and publication rankings (at least in this case) because the
latter only consider direct connections.

There are other outliers in the correlation figure, but we focused on some fields of which
we have personal knowledge. The complete data set is provided in the Supplementary Materials
for further examination. The results of parallel studies on level-1 and level-3 subfields are also
reported in the Supplementary Materials.

The same plot can be used to reveal the time evolution of the relative importance of the
subfields. In Fig. 3, we plot the values, instead of the rankings, of the IOF and citation counts
of all subfields between 1996 and 2011. The trajectories of a few subfields (05, 03, 04, 32, 61,
68, 74, 78, 82, 98) are highlighted. The following two facts were interesting and surprising to
us. First, for a very long time (before the year 2007) 05 (StatPhys) had a higher IOF than 03
(QuanMech), and second, several subfields of 60 (Condensed matter I) and 70 (Condensed
matter II) have decreasing IOFs even in cases of increasing citation counts. For example, the
citation count of 74 (Superconductivity) increases while its IOF decreases. See Supplementary
Materials for the figures of subfields at other levels.


[Figure 3: x-axis: Total Citation. Labeled trajectories include 04 Relativity, 05 StatPhys and 03 QuanMech.]


FIG. 3: IOFs are plotted against citation counts for all subfields during each year from the year 1991
to 2011. Evolutions of subfields 05, 03, 04, 32, 61, 68, 74, 78, 82, 98 are highlighted with their 2011
positions marked on this figure. Full data and a list of the top 20 at all levels can be found in the
Supplementary Materials.

Influences among the subfields. For a given subfield $j$, we calculate $\Delta^j_k$. This describes how
much the number of citations received by the subfield $k$ changes, directly and indirectly, if
subfield $j$ is removed from the field of physics. Subfield $k$ relies strongly on subfield $j$ when
$\Delta^j_k \ll 0$ and subfield $k$ can be regarded as a substitute for subfield $j$ when $\Delta^j_k \gg 0$.

In Fig. 4 we use two specific subfields, 98 (Stellar systems) and 03 (QuanMech), in
the time interval 2004-2008 as examples. We see that there is a large difference between the
influential sets, according to IOI and citation counts for subfield 03, while the difference is
smaller for subfield 98. It is also important to note that, according to Fig. 4A, the top 10 fields
with the greatest influence on 98 are generally in astronomy, relativity, stars, etc., which makes
intuitive sense. This observation supports our intuitive definition of $\Delta^j_k$. From Fig. 4B, we
see that, if, for example, one wants to boost the development of 03, then it might be necessary
to increase funding for 37 (Mechanical control of atoms etc.) and 39 (Instrumentation and
techniques for atomic and molecular physics, later partially merged into 37), which are not in
the top five fields cited from 03. A complete map of all the physics subfields at all levels is
provided in the Supplementary Materials.


[Figure 4: panels for subfields 98 and 03; x-axis of each panel: Citation ranking.]

FIG. 4: Influences among the level-2 subfields. (A) For some fields (e.g., 98-Stellar systems), the
influential rankings according to the IOI and citation counts are strongly correlated. It is also important to
notice that for 98 the top 10 closely related fields, most of which are in astronomy, relativity, stars etc., as
indicated above, intuitively make sense. (B) For some fields (e.g., 03-QuanMech), there is a large
difference between the two rankings. For example, while citation counts suggest that 37 (Mechanical control
of atoms etc.) and 39 (Instrumentation and techniques for atomic and molecular physics, later partially
merged into 37) do not significantly depend on 03 (QuanMech), the IOI analysis suggests that 37 and
39 do in fact depend very much on 03. See Supplementary Materials for the figures of the remaining
subfields at all levels.

Conclusion and Discussion 

In this paper we developed a method of closed-system input-output analysis and used it to study
influences between subfields of physics using APS publication data. We found that by including
both direct and indirect connections, our closed-system input-output analysis revealed deeper
relationships among subfields than could be observed by directly looking at the numbers of
citations and publications. This method provides an innovative approach to answering the two
questions raised at the beginning of the paper: Given a set of fields, which is more influential
and thus should be supported preferentially? Given a specific priority, what other fields are
necessary foundations for the targeted field and thus also need to be prioritized? When combined
with time-series data, this method can also be used to track the development of the influences
between scientific fields.

Furthermore, the method proposed and developed in this work can be applied back to studies 
of economic systems and more generally to any type of network with input-output relationships
between the nodes. For example, a new type of influence factor of and among journals can be 
established based on this method. With more and more data available in this era of big data, it 
will be interesting to see more applications of this method. In addition, it will also be interesting 
to see a comparison between our results and the results from applying the PageRank algorithm 
to the same problem because both approaches consider indirect connections.

In these Supplementary Materials, we provide some extra explanation of our methods and some
additional results which are mostly tables and figures, sometimes together with the data that are too large 
or too long to be included in the main text. 

1 Further details on methods and materials 

Uniqueness of the largest eigenvector of $B$ (and also $B^{(-j)}$) is a subtle and important technical issue
of the analysis proposed in this work. Here we provide some further discussion on this issue.

1.1 Uniqueness of the largest eigenvector of $B$ and $B^{(-j)}$

The Perron-Frobenius theorem of positive matrices, of which all elements are positive, states that
each positive matrix has a unique (up to a scalar factor) eigenvector containing only positive values and
the corresponding eigenvalue is the maximum real-value eigenvalue. Therefore, positive matrices have all
of the good properties that we expect matrix $B$ and $B^{(-j)}$ to have. However, our matrices $B$ and $B^{(-j)}$
are not positive but only non-negative matrices. The Perron-Frobenius theorem of non-negative matrices,
of which all elements are non-negative, claims that each irreducible non-negative matrix has a unique
eigenvector containing only positive values and the corresponding eigenvalue is the maximum real-value
eigenvalue. Note that matrices $B$ and $B^{(-j)}$ are not necessarily irreducible. Due to this, the largest
eigenvalue and the corresponding largest eigenvector might not be unique. Of course, it might be the case
that the largest eigenvector is still all positive and it is unique. Thus, we performed the following
additional analysis on matrix $B$ and all $B^{(-j)}$.

First, we check the existence and uniqueness of this largest non-negative eigenvector in our practical 
calculations. After removing all sectors with no output ($X^{j} = 0$) from matrix $X$ to define matrix $B$, we 
find that in all cases such a largest non-negative eigenvector exists and is unique for matrices $B$ and 
$B^{(-j)}$. However, although this holds in practice in our analysis, we cannot guarantee that for other 
systems matrices $B$ and $B^{(-j)}$ always have this property. 

Second, we check the irreducibility of matrices $B$ and $B^{(-j)}$, as required by the Perron-Frobenius 
theorem for non-negative matrices. One way to do that is to examine the strong connectivity of 
the graphs corresponding to $B$ and $B^{(-j)}$. We have done so in this work using the non-recursive Tarjan's 
algorithm with Nuutila's modifications provided in the NetworkX software [S15], and find that at all PACS 
levels the strongly connected components of $B$ and $B^{(-j)}$ cover more than 96% of all citations. At level 
1, for our 5-year period analysis, the whole network $B$ is already strongly connected and all the 
corresponding networks of $B^{(-j)}$ are also strongly connected. At level 2, the strongly connected subgraphs of 
$B$ and $B^{(-j)}$, denoted as $\bar{B}$ and $\bar{B}^{(-j)}$, keep 99% of the citations in the whole network. At level 3, 
the citation network is relatively sparse, so about 100 sectors are excluded, but the remaining strongly 
connected component keeps about 96% of the citations. These large percentages mean that even though 
matrices $B$ and $B^{(-j)}$ might sometimes not be irreducible, they are very close to irreducible matrices. 
In principle, we can always identify and then focus on only the strongly connected $\bar{B}$ and $\bar{B}^{(-j)}$. 
This procedure is, however, quite demanding. Here we suggest using a perturbative analysis instead. 
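
The connectivity check described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the NetworkX routine used in this work, and the toy citation matrix `C`, the function names and the simple reachability-based component search are all illustrative assumptions, not the code behind the reported percentages.

```python
def reachable(adj, start):
    """Return the set of nodes reachable from `start` (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected_components(matrix):
    """Group node indices into strongly connected components.

    An edge i -> j exists whenever matrix[i][j] > 0. Two nodes share a
    component when each is reachable from the other.
    """
    n = len(matrix)
    fwd = [[j for j in range(n) if matrix[i][j] > 0] for i in range(n)]
    bwd = [[j for j in range(n) if matrix[j][i] > 0] for i in range(n)]
    components, assigned = [], set()
    for node in range(n):
        if node in assigned:
            continue
        comp = reachable(fwd, node) & reachable(bwd, node)
        components.append(sorted(comp))
        assigned |= comp
    return components


def citation_coverage(matrix, nodes):
    """Fraction of all citations with both endpoints inside `nodes`."""
    total = sum(sum(row) for row in matrix)
    kept = sum(matrix[i][j] for i in nodes for j in nodes)
    return kept / total


# Hypothetical citation counts among four fields. Field 3 has outgoing
# links only, so the network is not strongly connected.
C = [
    [10, 5, 2, 0],
    [4, 20, 6, 0],
    [3, 7, 15, 0],
    [1, 1, 1, 0],
]
components = strongly_connected_components(C)
largest = max(components, key=len)
coverage = citation_coverage(C, largest)
```

In this toy case the largest strongly connected component contains three of the four fields and `coverage` is the share of citations it retains, which is the kind of percentage quoted above for each PACS level.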

Third, we use a perturbative analysis to, in a sense, calculate the largest non-negative eigenvector 
directly from $B$ and $B^{(-j)}$ instead of from the strongly connected $\bar{B}$ and $\bar{B}^{(-j)}$. The idea of this perturbative analysis 
comes from the PageRank algorithm and is quite straightforward: we compare the calculated 
largest eigenvectors of matrix $B$ and of $B_\alpha = (1 - \alpha) B + \alpha E$, where $\alpha$ is a numerical value very 
close to 0 and $E$ is the matrix with every element equal to 1. According to the Perron-Frobenius 
theorem for non-negative matrices, because $B$ might not be irreducible, the calculated largest eigenvector 
of $B$ might be only one of several eigenvectors corresponding to eigenvalues with the same 
maximum magnitude, while the calculated largest eigenvector of $B_\alpha$, since $B_\alpha$ is a positive matrix, is 
unique and corresponds to the largest eigenvalue, which is also unique. When we compare those 
two calculated eigenvectors, denoted respectively as $|\lambda(B)\rangle$ and $|\lambda(B_\alpha)\rangle$, the two 
vectors may be rather different or quite similar. Since $|\lambda(B_\alpha)\rangle$ is unique but $|\lambda(B)\rangle$ is not, in 
principle the two vectors can be quite different even with $\alpha$ very close to 0: there are multiple 
$|\lambda(B)\rangle$s and they live in a multidimensional largest-eigenvector space. Even with a tiny $\alpha$, 
the largest-eigenvector space collapses into a one-dimensional one. This change of dimension has 
a large effect unless the largest-eigenvector space of $B$ is already one-dimensional. Therefore, we can 
find out whether the largest-eigenvector space of $B$ is one-dimensional by simply checking whether the 
following expression gives a value numerically very close to 1 or not, 

$$U = \langle \lambda(B) | \lambda(B_\alpha) \rangle \quad (\alpha \approx 0). \qquad (1)$$

We also want to compare the eigenvectors of the strongly connected $\bar{B}$ and of $B_\alpha$: if we are to use the largest 
eigenvector of $B_\alpha$, then ideally it should be close to the one from $\bar{B}$. Thus we define 

$$V = \langle \lambda(\bar{B}) | \lambda(B_\alpha) \rangle \quad (\alpha \approx 0). \qquad (2)$$

Note that ideally we expect $V$ to be close to 1 and $U$ to be slightly smaller than 1. Theoretically, 
this holds for arbitrarily small $\alpha$, since introducing $\alpha$ breaks the multiplicity of the eigenvalue of largest 
magnitude and leaves a single largest eigenvector. However, numerical calculations always have 
finite accuracy, so we use a simple example in Fig. S1 to estimate a sufficiently large value 
of $\alpha$. Our numerical calculations are performed with SciPy [S16], specifically using the ARPACK 
linear algebra package [S17] that SciPy provides in Python. Remember that we do not want this value 
to be so large that $V$ becomes too small. 
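
As a rough illustration of this diagnostic, the sketch below builds the perturbed matrix $(1 - \alpha) B + \alpha E$ and evaluates the overlap $U$ for two toy matrices. It is only a sketch under stated assumptions: it uses plain power iteration on right eigenvectors instead of the ARPACK solver used in this work, the two 4 x 4 matrices are hypothetical, and a larger demo value of `alpha` is chosen so that the iteration converges quickly.

```python
import math


def perturb(B, alpha):
    """Return (1 - alpha) * B + alpha * E, with E the all-ones matrix."""
    n = len(B)
    return [[(1 - alpha) * B[i][j] + alpha for j in range(n)] for i in range(n)]


def leading_eigenvector(M, start, iterations=10000):
    """Power iteration: repeatedly apply M and renormalise to unit length."""
    v = list(start)
    for _ in range(iterations):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def overlap(u, v):
    """Absolute inner product of two unit vectors (1 means identical)."""
    return abs(sum(a * b for a, b in zip(u, v)))


# Toy matrix made of two disconnected blocks: its largest eigenvalue is
# degenerate, so power iteration returns a start-dependent vector.
B_blocks = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
# Toy strongly connected (irreducible) matrix for comparison.
B_chain = [
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.5, 0.25],
    [0.0, 0.0, 0.5, 0.5],
]
alpha = 1e-3  # demo value only; the analysis in the text uses 0.00002
start = [1.0, 0.0, 0.0, 0.0]

U_blocks = overlap(leading_eigenvector(B_blocks, start),
                   leading_eigenvector(perturb(B_blocks, alpha), start))
U_chain = overlap(leading_eigenvector(B_chain, start),
                  leading_eigenvector(perturb(B_chain, alpha), start))
```

For the disconnected example `U_blocks` stays well below 1 (about 0.71), which signals a multidimensional largest-eigenvector space, whereas for the strongly connected example `U_chain` is numerically 1.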



Figure S1 | (A) An example graph that is not strongly connected. (B) $V > U$ for a wide range of 
values of $\alpha$, except for $\alpha < 0.00001$, where $U > V$. 

The graph in Fig. S1 is not strongly connected and the corresponding adjacency matrix has multiple 
largest eigenvectors, i.e. $|\lambda(B)\rangle$ is not unique. The calculated largest eigenvectors of $B$ ($B^{(-j)}$) and 
$B_\alpha$ ($B_\alpha^{(-j)}$) are compared and we find that for almost all values of $\alpha$, $|\lambda(B_\alpha)\rangle$ is closer to $|\lambda(\bar{B})\rangle$ 
than to $|\lambda(B)\rangle$, except when $\alpha < 0.00001$. This means that even when the original matrix $B$ has multiple 
$|\lambda(B)\rangle$s, introducing this extremely small $\alpha$ makes $|\lambda(B_\alpha)\rangle$ unique and very close to the unique 
largest eigenvector $|\lambda(\bar{B})\rangle$ of the strongly connected $\bar{B}$. Based on what we have observed in this example, we use $B_\alpha$ and $B_\alpha^{(-j)}$ 
with $\alpha = 0.00002$ instead of directly using matrix $B$ or $B^{(-j)}$ in all of the analysis presented in the main 
text. Again, we want to emphasize that we intend to work on $\bar{B}$ and $\bar{B}^{(-j)}$; finding them is, however, very 
demanding, so we instead turn to the much less demanding $B_\alpha$ and $B_\alpha^{(-j)}$. 

With such an extremely small $\alpha$, we regard this as a simple technical issue that does not change 
much the properties of the desired largest eigenvectors of $B$ and $B^{(-j)}$. Due to this replacement, the core 
formulae Eq. (7) and Eq. (8) in fact need to be adjusted accordingly. However, since $\alpha$ is extremely 
small and we do this for purely technical reasons, we regard the eigenvectors and eigenvalues as those 
of matrices $B$ and $B^{(-j)}$, although strictly they are not. 

1.2 Proof of $\lambda \le 1$ 

In this section, for simplicity, we assume that $B$ and $B^{(-j)}$ are irreducible. If $B^{(-j)}$ had an eigenvalue 
whose magnitude is larger than 1, then the corresponding eigenvector would also be an eigenvector of 
matrix $B$ (by simply adding 0 at the $j$th component), and matrix $B$ would thus have an eigenvalue with 
magnitude larger than 1. This conflicts with 1 being the largest eigenvalue of $B$. Therefore, the magnitude of 
each eigenvalue of $B^{(-j)}$ must be less than or equal to 1. 
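
The bound can also be checked numerically on a small example. The sketch below is not the argument given above: it assumes that $B$ is row-normalised (every row sums to 1, an assumption about the normalisation that is not restated in this section) and uses the standard fact that the magnitude of every eigenvalue is bounded by the largest absolute row sum. The 3 x 3 matrix and the function names are hypothetical.

```python
def remove_sector(B, j):
    """Return B with row j and column j deleted (sector j removed)."""
    n = len(B)
    return [[B[r][c] for c in range(n) if c != j] for r in range(n) if r != j]


def max_row_sum(M):
    """Largest absolute row sum: an upper bound on every |eigenvalue| of M."""
    return max(sum(abs(x) for x in row) for row in M)


def dominant_eigenvalue(M, iterations=500):
    """Power-iteration estimate, valid here because all entries are positive."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in M]
        lam = max(w)              # growth factor, since max(v) == 1
        v = [x / lam for x in w]
    return lam


# Hypothetical row-normalised matrix: every row sums to 1, so its
# largest eigenvalue is 1.
B = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]
bounds = [max_row_sum(remove_sector(B, j)) for j in range(len(B))]
eigenvalues = [dominant_eigenvalue(remove_sector(B, j)) for j in range(len(B))]
```

Deleting a row and a column from a non-negative row-normalised matrix can only lower the remaining row sums, so every entry of `bounds`, and hence of `eigenvalues`, is at most 1 in this example.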

2 Additional results 

2.1 Tables and figures for influences of subfields of physics 

In Fig. 2 of the main text, we report correlations between the ranks of level-2 subfields based on our IOF 
and on the number of citations. Here we provide the same correlation plots for level-1 and level-3 subfields of 
physics. 




Figure S2 | (A) Correlation between the citation ranks and the IOF ranks of level-1 subfields. 
PACS codes such as 00 (General Physics) have better IOF ranks than citation ranks, 
while PACS codes such as 10 (The Physics of Elementary Particles and Fields) have better 
citation ranks than IOF ranks. (B) Correlation between the citation ranks and the IOF ranks of level-3 subfields. 
Fields 67.85 (Ultracold gases, trapped gases) and 78.67 (Optical properties of 
low-dimensional, mesoscopic, and nanoscale materials and structures) have better IOF ranks than 
citation ranks, while fields 71.45 (Collective effects) and 98.80 (Cosmology) have better citation ranks than 
IOF ranks. 

We observed that there are again some outliers in the level-1 and level-3 correlation plots. At level 
1, field 00 (General Physics) has a better IOF rank than citation rank, while field 10 (The Physics 
of Elementary Particles and Fields) has a better citation rank than IOF rank. At level 3, fields 
67.85 (Ultracold gases, trapped gases) and 78.67 (Optical properties of low-dimensional, mesoscopic, 
and nanoscale materials and structures) have better IOF ranks, while fields 71.45 (Collective effects) and 
98.80 (Cosmology) have better citation ranks. Field 30 at level 1 is rather special: overall it has a low 
IOF, yet its IOF rank is still better than its citation rank. This means that the influence of field 30 can be 
underestimated when judged only from the number of papers and citations in the field, although it is indeed not 
that influential. 

To better show the time evolution of the influences of subfields, we present an animated version of Fig. 3 
of the main text. 

Figure S3 | Evolution of influences of level-2 subfields between 1991 and 2011. A PDF reader that 
supports animations, such as Acrobat Reader, might be needed to show the animation. 


We also include a table of the top 20 most influential level-2 subfields in 2011. 

Table S1 | Top 20 most influential level-2 subfields in 2011. Full data can be downloaded at 
subfield_list_level2.txt. 


| rank | PACS code | subfield | IOF |
|---|---|---|---|
| 1 | 03 | Quantum mechanics, field theories, and special relativity | 0.0334107413899 |
| 2 | 05 | Statistical physics, thermodynamics, and nonlinear dynamical systems | 0.032334689129 |
| 3 | 42 | Optics | 0.0266783205433 |
| 4 | 71 | Electronic structure of bulk materials | 0.026051707365 |
| 5 | 73 | Electronic structure and electrical properties of surfaces, interfaces, thin films, and low-dimensional structures | 0.0244555083998 |
| 6 | 75 | Magnetic properties and materials | 0.0233384148198 |
| 7 | 78 | Optical properties, condensed-matter spectroscopy and other interactions of radiation and particles with condensed matter | 0.0178594554773 |
| 8 | 61 | Structure of solids and liquids; crystallography | 0.017412083493 |
| 9 | 68 | Surfaces and interfaces; thin films and nanosystems (structure and nonelectronic properties) | 0.0141285788655 |
| 10 | 74 | Superconductivity | 0.0140740641051 |
| 11 | 72 | Electronic transport in condensed matter | 0.0134113056464 |
| 12 | 64 | Equations of state, phase equilibria, and phase transitions | 0.0131746830176 |
| 13 | 11 | General theory of fields and particles | 0.0124063117582 |
| 14 | 47 | Fluid dynamics | 0.0121936938139 |
| 15 | 87 | Biological and medical physics | 0.0120704923922 |
| 16 | 02 | Mathematical methods in physics | 0.0118169903456 |
| 17 | 12 | Specific theories and interaction models; particle systematics | 0.011308323537 |
| 18 | 81 | Materials science | 0.0104836041511 |
| 19 | 82 | Physical chemistry and chemical physics | 0.00973865082348 |
| 20 | 67 | Quantum fluids and solids | 0.00948140209385 |


We also include here the evolution of the top 20 subfields at each level for the years 1991, 2001 and 
2011. A text file with the full list of subfields at each level for all years between 1991 and 2011 can be 
downloaded at subfield_list_level1.txt, subfield_list_level2.txt and subfield_list_level3.txt 
for levels 1, 2 and 3 respectively. 

To look more closely into the finer structure, we analyzed the relative importance of the level-3 
subfields and then plotted them according to their level-1 classifications. The result of this 
analysis is presented using multi-layer pie charts in Fig. S7. The layers of each chart, from inner to outer, 
represent successive 25% groups of subfields ordered from the most to the least influential. In each layer, we 
use different colors to represent the level-1 PACS codes of the subfields. Here level-1 PACS codes are 
regarded as the major branches of physics. In this way, we can see how each region is composed of 
major branches of physics and how this composition changed over time. 
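
The grouping behind these charts can be sketched as follows. This is only an illustration under stated assumptions: it uses just the 20 level-3 entries listed in Table S2 (the actual figure is built from all level-3 subfields), and the helper names `level1_branch` and `quartile_composition` are hypothetical.

```python
from collections import Counter

# (PACS code, IOF) pairs for the 2011 top-20 level-3 subfields (Table S2).
top20_level3 = [
    ("03.65", 0.0148953211316), ("42.50", 0.0147084409465),
    ("03.67", 0.0131002568588), ("05.45", 0.0105192533981),
    ("03.75", 0.00982284895269), ("42.65", 0.00822584929776),
    ("71.10", 0.00779028463258), ("73.20", 0.00771114964267),
    ("05.40", 0.00769361189889), ("75.10", 0.00728881536532),
    ("78.67", 0.00695251123923), ("75.30", 0.00676542990058),
    ("05.30", 0.00670483397365), ("71.15", 0.00668113108307),
    ("75.50", 0.00617769198), ("64.70", 0.00612260589561),
    ("67.85", 0.00611556368295), ("64.60", 0.00609626427723),
    ("89.75", 0.00594258830918), ("74.25", 0.00592012932517),
]


def level1_branch(pacs_code):
    """Map a PACS code such as '42.50' to its level-1 branch, here '40'."""
    return pacs_code[0] + "0"


def quartile_composition(subfields):
    """Sort by IOF (descending), split into four layers, count branches."""
    ranked = sorted(subfields, key=lambda item: item[1], reverse=True)
    size = len(ranked) // 4
    layers = []
    for q in range(4):
        # The last layer also takes any remainder of the division.
        end = (q + 1) * size if q < 3 else len(ranked)
        chunk = ranked[q * size:end]
        layers.append(Counter(level1_branch(code) for code, _ in chunk))
    return layers


layers = quartile_composition(top20_level3)
```

Each entry of `layers` gives, for one 25% group, how many subfields belong to each level-1 branch, which is the quantity that one ring of a pie chart displays.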

The pie chart for 1991 shows that the top quartile consisted mostly of Condensed Matter (PACS 60 
and 70), General Physics (PACS 00) and Elementary Particles and Fields (PACS 10), with small 
contributions from Atomic and Molecular Physics (PACS 30), Electromagnetism, Optics, Classical Mechanics 
(PACS 40) and Interdisciplinary Physics (PACS 80). When the 2001 and 2011 pie charts are compared 
with the 1991 pie chart, we see General Physics and Interdisciplinary Physics become larger parts of the 
core region while Atomic and Molecular Physics shrinks. We can also look at the change in the 
distribution of a particular color across all four quartile sets. For example, Interdisciplinary Physics moves 
steadily toward the center, the more influential level, whereas large portions of Geophysics, Astronomy 
and Astrophysics (PACS 90) migrate from the second quartile to the third quartile of most influential subfields. 

Table S2 | Top 20 most influential level-3 subfields in 2011. Full data can be downloaded at 
subfield_list_level3.txt. 

| rank | PACS code | subfield | IOF |
|---|---|---|---|
| 1 | 03.65 | Quantum mechanics | 0.0148953211316 |
| 2 | 42.50 | Quantum optics | 0.0147084409465 |
| 3 | 03.67 | Quantum information | 0.0131002568588 |
| 4 | 05.45 | Nonlinear dynamics and chaos | 0.0105192533981 |
| 5 | 03.75 | Matter waves | 0.00982284895269 |
| 6 | 42.65 | Nonlinear optics | 0.00822584929776 |
| 7 | 71.10 | Theories and models of many-electron systems | 0.00779028463258 |
| 8 | 73.20 | Electron states at surfaces and interfaces | 0.00771114964267 |
| 9 | 05.40 | Fluctuation phenomena, random processes, noise, and Brownian motion | 0.00769361189889 |
| 10 | 75.10 | General theory and models of magnetic ordering | 0.00728881536532 |
| 11 | 78.67 | Optical properties of low-dimensional, mesoscopic, and nanoscale materials and structures | 0.00695251123923 |
| 12 | 75.30 | Intrinsic properties of magnetically ordered materials | 0.00676542990058 |
| 13 | 05.30 | Quantum statistical mechanics | 0.00670483397365 |
| 14 | 71.15 | Methods of electronic structure calculations | 0.00668113108307 |
| 15 | 75.50 | Studies of specific magnetic materials | 0.00617769198 |
| 16 | 64.70 | Specific phase transitions | 0.00612260589561 |
| 17 | 67.85 | Ultracold gases, trapped gases | 0.00611556368295 |
| 18 | 64.60 | General studies of phase transitions | 0.00609626427723 |
| 19 | 89.75 | Complex systems | 0.00594258830918 |
| 20 | 74.25 | Properties of superconductors | 0.00592012932517 |

Figure S4 | Evolution of all level-1 subfields for the years 1991, 2001 and 2011. Field 00 (General 
Physics) moved from third place in 1991 to the top in 2001 and stayed at the top until 2011. The ranks are 
quite stable after 2001. Full data can be downloaded at subfield_list_level1.txt. 

Figure S5 | Evolution of the top 20 most influential level-2 subfields for the years 1991, 2001 and 2011. 
The influence of field 03 (QuanMech) increased during the whole period and it had reached the top by 2011, in 
fact around 2009 as indicated in Fig. 3 of the main text. We also observed that the overall influences of fields 
in 60 and 70 are stable or slightly decreasing compared to 1991. Full data can be downloaded at 
subfield_list_level2.txt. 

Figure S6 | Evolution of the top 20 most influential level-3 subfields for the years 1991, 2001 and 2011. At 
this level of detail, we can see even more clearly that the overall influences of fields in 60 and 70 are stable 
or slightly decreasing compared to 1991, while fields in 03 and 05 are increasing. Full data 
can be downloaded at subfield_list_level3.txt. 



Figure S7 | Multi-level pie charts for 1991, 2001 and 2011 showing the composition of major branches of 
physics within each of the four 25% regions. Colors represent the level-1 PACS codes, which stand for 
the major branches of physics here. Over the years, in the core region of the first-quartile influential 
subfields, General Physics (00) and Interdisciplinary Physics (80) have taken a larger part while Atomic and 
Molecular Physics has been shrinking. 


2.2 Figures for influences among subfields of physics 

In Fig. 6 of the main text, we chose two specific level-2 subfields, 98 and 03, and presented the subfields 
that are closely related to them. Here we report influences among all level-2 subfields 
and also among all level-1 and level-3 subfields. We use a heatmap for this purpose: the size of each 
circle represents the number of citations from the column subfield to the row subfield, while the color of the 
circle corresponds to our IOI from the row to the column subfield. The number of citations 
has been renormalized with respect to the row subfield. 

First, we note that the order of the degrees of influence is not always the same as 
the order of citation counts. Second, we found that a large number of IOIs are positive while a few 
of them are negative. We interpret a positive IOI, which means that the outcome of field j decreases 
when field i is removed from the whole discipline, as a relying-on relation, and a negative one, 
which means that the outcome of field j increases when field i is removed, as a competitive or substitutive 
relation. A detailed examination of all those relations, which has not been done in this work, should 
be interesting. Third and finally, we also observed that in the level-2 and level-3 heatmaps there 
are overall relatively stronger correlations among subfields within the same category (the diagonal block 
elements) than among subfields across categories (the off-diagonal block elements). This means 
that the boundaries between different categories, represented by the hierarchical structure of the PACS 
codes, do meaningfully describe the interconnections among subfields. 



Figure S8 | Given a level-1 column subfield, its relations to the other subfields in the rows are color coded: 
a deeper color represents a stronger influence. The size of each circle is proportional to the number of citations 
received by the row field from the column field. Full data can be downloaded at heatmap_level1.txt. 



Figure S9 | Given a level-2 column subfield, its relations to the other subfields in the rows are color coded: 
a deeper color represents a stronger influence. The size of each circle is proportional to the number of citations 
received by the row field from the column field. Full data can be downloaded at heatmap_level2.txt. 

The level-3 heatmap has 940 x 940 entries, so it is too big to show in great detail as a figure. Therefore, 
for this map, we also provide a PNG file for download, relation_heatmap_level3.png, and a 
data file in Excel format, which is accessible via heatmap_level3.txt (heatmap_level1.txt 
and heatmap_level2.txt for the level-1 and level-2 heatmaps respectively). 



3 Supplementary References 

[S15] Aric A. Hagberg, Daniel A. Schult and Pieter J. Swart, "Exploring network structure, dynamics, and 
function using NetworkX", in Proceedings of the 7th Python in Science Conference (SciPy2008), 
Gael Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA), pp. 11-15, Aug 
2008. 

[S16] Jones E, Oliphant T, Peterson P, et al. SciPy: Open Source Scientific Tools for Python, 2001-, 
http://www.scipy.org/ [Online; accessed 2015-02-26]. 

[S17] ARPACK SOFTWARE (available at http://www.caam.rice.edu/software/ARPACK/). 

